# Sources for the English Dataset

You will need to download three corpora:

- [Gutenberg Dataset](https://web.eecs.umich.edu/~lahiri/gutenberg_dataset.html): download the updated version from August 17, 2018, and place it in this folder so that the path is `rawTexts/english/Gutenberg`.

- [Shakespeare Corpus](https://lexically.net/wordsmith/support/shakespeare.html): place it in this folder so that the path is `rawTexts/english/ShakespearePlaysPlus`.

- The Middle English Texts: place them in this folder so that the path is `rawTexts/english/ME`. A download link will be added if I am able to provide the texts with the appropriate rights. All texts were sourced from the [Teams Middle English Texts Series](https://d.lib.rochester.edu/teams) with permission from Russell Peck.
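Once the downloads are in place, a quick check of the folder layout can catch a misplaced corpus before any processing runs. The sketch below is a hypothetical helper (`missing_corpora` is not part of this repository); it assumes it is run from the directory that contains `rawTexts/` and only checks that the three expected folders exist, not that their contents are complete.

```python
from pathlib import Path

# Expected corpus locations from the list above, relative to the directory
# that contains rawTexts/ (assumption: the check is run from that directory).
EXPECTED_DIRS = [
    Path("rawTexts/english/Gutenberg"),
    Path("rawTexts/english/ShakespearePlaysPlus"),
    Path("rawTexts/english/ME"),
]


def missing_corpora(root="."):
    """Return the expected corpus folders (POSIX-style paths) missing under root."""
    base = Path(root)
    return [d.as_posix() for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_corpora()
    if missing:
        print("Missing corpus folders:")
        for path in missing:
            print("  " + path)
    else:
        print("All three English corpora are in place.")
```

Paths are reported with forward slashes via `as_posix()` so the output matches the paths listed above on any operating system.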
